CASE STUDY - 1 :: Healthcare Provider Fraudulent Detection¶
This notebook contains the extensive data analysis that been carried out on the publicly available dataset at Kaggle with the intent:
Kindly checkout below link for gaining BUSINESS related insights about this problem:
Kindly checkout below link for TECHNICAL description about this problem:
Notebook Contents¶CASE STUDY - 1 :: Healthcare Provider Fraudulent Detection
Q1. How many unique beneficiaries we have in our dataset?
Q2. How many records we have at the GENDER level?
Q3. Lets calculate the AGE of every BENEFICIARY?
Q4. Lets see the ratio of GENDER across various HUMAN RACE?
Q5. Lets see the number of beneficiaries with Chronic Renal Disease.
Q6. Lets see the number of beneficiaries on the basis of State Codes.
Q7. Lets see the number of beneficiaries on the basis of Country Codes.
Q8.1 Lets see the number of beneficiaries on the basis of 'NoOfMonths_PartACov'.
Q8.2 Lets see the number of beneficiaries on the basis of 'NoOfMonths_PartBCov'.
## TRAIN set files
!gdown 12zSQN2FOxmuXFhz2xzPNussPisEfVP5w
!gdown 13XyBakfHiG-BNQPrYFXAHlsOcfICOTpx
!gdown 1dLxl4vkykPcm4Zj0abYR0Ohr7STQHg-1
!gdown 1rFER-7VuYb7GfCYeJrfxPidgK0lwqw3R
import os
import sys
import math
import scipy as scipy
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
pd.set_option('display.max_columns',30)
label_font_dict = {'family':'sans-serif','size':13.5,'color':'brown','style':'italic'}
title_font_dict = {'family':'sans-serif','size':16.5,'color':'Blue','style':'italic'}
train_bene_df = pd.read_csv("Dataset\Train\Train_Beneficiarydata-1542865627584.csv")
train_ip_df = pd.read_csv("Dataset\Train\Train_Inpatientdata-1542865627584.csv")
train_op_df = pd.read_csv("Dataset\Train\Train_Outpatientdata-1542865627584.csv")
train_bene_df.shape
(138556, 25)
train_bene_df.columns
Index(['BeneID', 'DOB', 'DOD', 'Gender', 'Race', 'RenalDiseaseIndicator',
'State', 'County', 'NoOfMonths_PartACov', 'NoOfMonths_PartBCov',
'ChronicCond_Alzheimer', 'ChronicCond_Heartfailure',
'ChronicCond_KidneyDisease', 'ChronicCond_Cancer',
'ChronicCond_ObstrPulmonary', 'ChronicCond_Depression',
'ChronicCond_Diabetes', 'ChronicCond_IschemicHeart',
'ChronicCond_Osteoporasis', 'ChronicCond_rheumatoidarthritis',
'ChronicCond_stroke', 'IPAnnualReimbursementAmt',
'IPAnnualDeductibleAmt', 'OPAnnualReimbursementAmt',
'OPAnnualDeductibleAmt'],
dtype='object')
train_bene_df.dtypes
BeneID object DOB object DOD object Gender int64 Race int64 RenalDiseaseIndicator object State int64 County int64 NoOfMonths_PartACov int64 NoOfMonths_PartBCov int64 ChronicCond_Alzheimer int64 ChronicCond_Heartfailure int64 ChronicCond_KidneyDisease int64 ChronicCond_Cancer int64 ChronicCond_ObstrPulmonary int64 ChronicCond_Depression int64 ChronicCond_Diabetes int64 ChronicCond_IschemicHeart int64 ChronicCond_Osteoporasis int64 ChronicCond_rheumatoidarthritis int64 ChronicCond_stroke int64 IPAnnualReimbursementAmt int64 IPAnnualDeductibleAmt int64 OPAnnualReimbursementAmt int64 OPAnnualDeductibleAmt int64 dtype: object
train_bene_df.head()
| BeneID | DOB | DOD | Gender | Race | RenalDiseaseIndicator | State | County | NoOfMonths_PartACov | NoOfMonths_PartBCov | ChronicCond_Alzheimer | ChronicCond_Heartfailure | ChronicCond_KidneyDisease | ChronicCond_Cancer | ChronicCond_ObstrPulmonary | ChronicCond_Depression | ChronicCond_Diabetes | ChronicCond_IschemicHeart | ChronicCond_Osteoporasis | ChronicCond_rheumatoidarthritis | ChronicCond_stroke | IPAnnualReimbursementAmt | IPAnnualDeductibleAmt | OPAnnualReimbursementAmt | OPAnnualDeductibleAmt | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BENE11001 | 1943-01-01 | NaN | 1 | 1 | 0 | 39 | 230 | 12 | 12 | 1 | 2 | 1 | 2 | 2 | 1 | 1 | 1 | 2 | 1 | 1 | 36000 | 3204 | 60 | 70 |
| 1 | BENE11002 | 1936-09-01 | NaN | 2 | 1 | 0 | 39 | 280 | 12 | 12 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 0 | 0 | 30 | 50 |
| 2 | BENE11003 | 1936-08-01 | NaN | 1 | 1 | 0 | 52 | 590 | 12 | 12 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 2 | 0 | 0 | 90 | 40 |
| 3 | BENE11004 | 1922-07-01 | NaN | 1 | 1 | 0 | 39 | 270 | 12 | 12 | 1 | 1 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 2 | 0 | 0 | 1810 | 760 |
| 4 | BENE11005 | 1935-09-01 | NaN | 1 | 1 | 0 | 24 | 680 | 12 | 12 | 2 | 2 | 2 | 2 | 1 | 2 | 1 | 2 | 2 | 2 | 2 | 0 | 0 | 1790 | 1200 |
train_bene_df['BeneID'].nunique()
138556
train_bene_df['Gender'].unique()
array([1, 2], dtype=int64)
train_bene_df['Gender'] = train_bene_df['Gender'].apply(lambda val: 0 if val == 2 else 1)
# Here, I'm displaying the distribution of BENEFICIARIES on the basis of GENDER
with plt.style.context('seaborn'):
plt.figure(figsize=(10,8))
fig = train_bene_df['Gender'].value_counts().plot(kind='bar', color=['yellow','purple'])
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/train_bene_df.shape[0],2))+"%"}', (x + width/2, y + height*1.015), ha='center', fontsize=13.5)
# Providing the labels and title to the graph
plt.xlabel("Gender Code", fontdict=label_font_dict)
plt.ylabel("Number or % share of patients\n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Distribution of BENEFICIARIES based on GENDER\n", fontdict=title_font_dict)
OBERVATION
train_bene_df['DOB'] = pd.to_datetime(train_bene_df['DOB'], format="%Y-%m-%d")
train_bene_df['Patient_Age_Year'] = train_bene_df['DOB'].dt.year
train_bene_df['Patient_Age_Month'] = train_bene_df['DOB'].dt.month
Adding new feature :::::::::::::::::::::::: "YEAR of birth of beneficiaries"bene_age_year_df = pd.DataFrame(train_bene_df['Patient_Age_Year'].value_counts()).reset_index(drop=False)
bene_age_year_df.columns= ['year','num_of_beneficiaries']
bene_age_year_df = bene_age_year_df.sort_values(by='year')
# Here, I'm displaying the distribution of BENEFICIARIES on the basis of their YEAR of Birth
with plt.style.context('seaborn'):
plt.figure(figsize=(21,9))
fig = sns.barplot(data=bene_age_year_df, x='year', y='num_of_beneficiaries', palette='inferno')
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/train_bene_df.shape[0],1))+"%"}', (x + width/2, y + height*1.025), ha='center', fontsize=13.5, rotation=90)
# Providing the labels and title to the graph
plt.xlabel("\nBeneficary YEAR of Birth", fontdict=label_font_dict)
plt.xticks(rotation=90)
plt.ylabel("Number or % share of patients\n", fontdict=label_font_dict)
plt.minorticks_on()
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.title("Distribution of BENEFICIARIES based on their YEAR of birth\n", fontdict=title_font_dict)
OBERVATION
REASONING
Adding new feature :::::::::::::::::::::::: "MONTH of birth of beneficiaries"bene_age_month_df = pd.DataFrame(train_bene_df['Patient_Age_Month'].value_counts()).reset_index(drop=False)
bene_age_month_df.columns= ['month','num_of_beneficiaries']
bene_age_month_df = bene_age_month_df.sort_values(by='month')
# Here, I'm displaying the distribution of BENEFICIARIES on the basis of their MONTH of Birth
with plt.style.context('seaborn'):
plt.figure(figsize=(12,8))
fig = sns.barplot(data=bene_age_month_df, x='month', y='num_of_beneficiaries', palette='summer')
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/train_bene_df.shape[0],1))+"%"}', (x + width/2, y + height*1.025), ha='center', fontsize=13.5, rotation=90)
# Providing the labels and title to the graph
plt.xlabel("\nBeneficary MONTH of Birth\n", fontdict=label_font_dict)
plt.xticks(rotation=90)
plt.ylabel("Number or % share of patients", fontdict=label_font_dict)
plt.minorticks_on()
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.title("Distribution of BENEFICIARIES based on their MONTH of birth\n", fontdict=title_font_dict)
OBERVATION
REASONING
train_bene_df.head()
| BeneID | DOB | DOD | Gender | Race | RenalDiseaseIndicator | State | County | NoOfMonths_PartACov | NoOfMonths_PartBCov | ChronicCond_Alzheimer | ChronicCond_Heartfailure | ChronicCond_KidneyDisease | ChronicCond_Cancer | ChronicCond_ObstrPulmonary | ChronicCond_Depression | ChronicCond_Diabetes | ChronicCond_IschemicHeart | ChronicCond_Osteoporasis | ChronicCond_rheumatoidarthritis | ChronicCond_stroke | IPAnnualReimbursementAmt | IPAnnualDeductibleAmt | OPAnnualReimbursementAmt | OPAnnualDeductibleAmt | Patient_Age_Year | Patient_Age_Month | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BENE11001 | 1943-01-01 | NaN | 1 | 1 | 0 | 39 | 230 | 12 | 12 | 1 | 2 | 1 | 2 | 2 | 1 | 1 | 1 | 2 | 1 | 1 | 36000 | 3204 | 60 | 70 | 1943 | 1 |
| 1 | BENE11002 | 1936-09-01 | NaN | 0 | 1 | 0 | 39 | 280 | 12 | 12 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 0 | 0 | 30 | 50 | 1936 | 9 |
| 2 | BENE11003 | 1936-08-01 | NaN | 1 | 1 | 0 | 52 | 590 | 12 | 12 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 2 | 0 | 0 | 90 | 40 | 1936 | 8 |
| 3 | BENE11004 | 1922-07-01 | NaN | 1 | 1 | 0 | 39 | 270 | 12 | 12 | 1 | 1 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 2 | 0 | 0 | 1810 | 760 | 1922 | 7 |
| 4 | BENE11005 | 1935-09-01 | NaN | 1 | 1 | 0 | 24 | 680 | 12 | 12 | 2 | 2 | 2 | 2 | 1 | 2 | 1 | 2 | 2 | 2 | 2 | 0 | 0 | 1790 | 1200 | 1935 | 9 |
Adding new indicator :::::::::::::::::::::::: "Beneficiary Dead or Alive?"REASONING# 0 means ALIVE and 1 means DEAD
train_bene_df['Dead_or_Alive'] = train_bene_df['DOD'].apply(lambda val: 0 if val != val else 1)
train_bene_df['Dead_or_Alive'].value_counts()
0 137135 1 1421 Name: Dead_or_Alive, dtype: int64
# Here, I'm displaying the distribution of whether BENEFICIARY is ALIVE or NOT?
with plt.style.context('seaborn'):
plt.figure(figsize=(10,8))
fig = train_bene_df['Dead_or_Alive'].value_counts().plot(kind='bar', color=['lightgreen','coral'])
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/train_bene_df.shape[0],2))+"%"}', (x + width/2, y + height*1.015), ha='center', fontsize=13.5)
# Providing the labels and title to the graph
plt.xlabel("Alive or Dead Status?", fontdict=label_font_dict)
plt.xticks(labels=["ALIVE","DEAD"], ticks=[0,1], rotation=20)
plt.ylabel("Number or % share of patients\n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Distribution of BENEFICIARIES based on Alive or Dead Status\n", fontdict=title_font_dict)
OBSERVATIONtrain_bene_df['DOD'] = pd.to_datetime(train_bene_df['DOD'])
# Greatest Date of Death in the TRAIN set for beneficiaries
max_bene_DOD = max(train_bene_df['DOD'].unique()[1:])
max_bene_DOD
numpy.datetime64('2009-12-01T00:00:00.000000000')
# For all NAN DODs filling the greatest Date of Death
train_bene_df['DOD'].fillna(value=max_bene_DOD, inplace=True)
Adding new feature :::::::::::::::::::::::: "Beneficiary AGE"REASONINGtrain_bene_df['AGE'] = np.round(((train_bene_df['DOD'] - train_bene_df['DOB']).dt.days)/365.0,1)
train_bene_df.drop(labels=['DOD'],axis=1,inplace=True)
train_bene_df.head()
| BeneID | DOB | Gender | Race | RenalDiseaseIndicator | State | County | NoOfMonths_PartACov | NoOfMonths_PartBCov | ChronicCond_Alzheimer | ChronicCond_Heartfailure | ChronicCond_KidneyDisease | ChronicCond_Cancer | ChronicCond_ObstrPulmonary | ChronicCond_Depression | ChronicCond_Diabetes | ChronicCond_IschemicHeart | ChronicCond_Osteoporasis | ChronicCond_rheumatoidarthritis | ChronicCond_stroke | IPAnnualReimbursementAmt | IPAnnualDeductibleAmt | OPAnnualReimbursementAmt | OPAnnualDeductibleAmt | Patient_Age_Year | Patient_Age_Month | Dead_or_Alive | AGE | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BENE11001 | 1943-01-01 | 1 | 1 | 0 | 39 | 230 | 12 | 12 | 1 | 2 | 1 | 2 | 2 | 1 | 1 | 1 | 2 | 1 | 1 | 36000 | 3204 | 60 | 70 | 1943 | 1 | 0 | 67.0 |
| 1 | BENE11002 | 1936-09-01 | 0 | 1 | 0 | 39 | 280 | 12 | 12 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 0 | 0 | 30 | 50 | 1936 | 9 | 0 | 73.3 |
| 2 | BENE11003 | 1936-08-01 | 1 | 1 | 0 | 52 | 590 | 12 | 12 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 2 | 0 | 0 | 90 | 40 | 1936 | 8 | 0 | 73.4 |
| 3 | BENE11004 | 1922-07-01 | 1 | 1 | 0 | 39 | 270 | 12 | 12 | 1 | 1 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 2 | 0 | 0 | 1810 | 760 | 1922 | 7 | 0 | 87.5 |
| 4 | BENE11005 | 1935-09-01 | 1 | 1 | 0 | 24 | 680 | 12 | 12 | 2 | 2 | 2 | 2 | 1 | 2 | 1 | 2 | 2 | 2 | 2 | 0 | 0 | 1790 | 1200 | 1935 | 9 | 0 | 74.3 |
# Here, I'm displaying the distribution of AGE of Beneficiaries?
with plt.style.context('seaborn'):
plt.figure(figsize=(10,8))
train_bene_df['AGE'].plot(kind='hist', color='purple')
# Providing the labels and title to the graph
plt.xlabel("\nBeneficiaries Age in years", fontdict=label_font_dict)
plt.ylabel("Frequency of patients\n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Distribution of BENEFICIARIES AGE", fontdict=title_font_dict)
plt.legend();
OBSERVATIONtrain_bene_df['AGE'].describe()
count 138556.000000 mean 73.632756 std 12.736605 min 26.000000 25% 68.000000 50% 74.300000 75% 82.000000 max 101.000000 Name: AGE, dtype: float64
Adding new categorical column :::::::::::::::::::::::: "Beneficiary AGE brackets"REASONINGdef bene_age_brackets(val):
if val >=1 and val <=40:
return 'Young'
elif val > 40 and val <=60:
return 'Mid'
elif val > 60 and val <= 80:
return 'Old'
else:
return 'Very Old'
train_bene_df['AGE_groups'] = train_bene_df['AGE'].apply(lambda age: bene_age_brackets(age))
age_grps = list(train_bene_df['AGE_groups'].unique())
for grp in age_grps:
# Here, I'm displaying the distribution of AGE GROUPS of Beneficiaries?
with plt.style.context('seaborn'):
plt.figure(figsize=(8,6))
train_bene_df[train_bene_df['AGE_groups'] == grp]['AGE'].plot(kind='hist', color='grey')
# Providing the labels and title to the graph
plt.xlabel("\nBeneficiaries Age in years", fontdict=label_font_dict)
plt.ylabel("Frequency of patients\n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Distribution of BENEFICIARIES Age group -- {}".format(str(grp).upper()), fontdict=title_font_dict)
plt.legend();
OBSERVATIONThis is based on a racial classification made by Carleton S. Coon in 1962. Refer here.
train_bene_df['Race'].unique()
array([1, 2, 3, 5], dtype=int64)
# Here, I'm displaying the distribution of BENEFICIARIES on the basis of Human RACE
with plt.style.context('seaborn'):
plt.figure(figsize=(10,8))
fig = train_bene_df['Race'].value_counts().plot(kind='bar', color=['lightgreen','coral','purple','red'])
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/train_bene_df.shape[0],2))+"%"}', (x + width/2, y + height*1.02), ha='center', fontsize=13.5)
# Providing the labels and title to the graph
plt.xlabel("Human RACE", fontdict=label_font_dict)
plt.xticks(labels=["Race_1","Race_2","Race_3","Race_5"], ticks=[0,1,2,3], rotation=10)
plt.ylabel("Number or % share of patients\n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Distribution of BENEFICIARIES on the basis of Human RACE\n", fontdict=title_font_dict)
OBSERVATION# Lets validate whether we have imbalance of males and females across the human races
with plt.style.context('seaborn'):
plt.figure(figsize=(12,8))
fig = train_bene_df.groupby(['Gender','Race'])['AGE'].count().plot(kind='bar', color=['lightgreen','coral','purple','red','lightgreen','coral','purple','red'])
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/train_bene_df.shape[0],2))+"%"}', (x + width/2, y + height*1.02), ha='center', fontsize=13.5)
# Providing the labels and title to the graph
plt.xlabel("Gender & Human RACE", fontdict=label_font_dict)
plt.xticks(labels=[("Gender_0","Race_1"), ("Gender_0","Race_2"), ("Gender_0","Race_3"), ("Gender_0","Race_5"),("Gender_1","Race_1"), ("Gender_1","Race_2"), ("Gender_1","Race_3"), ("Gender_1","Race_5")],
ticks=[0,1,2,3,4,5,6,7], rotation=80)
plt.ylabel("Number or % share of patients\n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Imbalance of males and females across the human races?\n", fontdict=title_font_dict)
OBSERVATIONThere are two main types of kidney disease - short-term (acute kidney injury) and lifelong (chronic).
The two main types of kidney disease are short-term (acute kidney injury) and lifelong (chronic kidney disease).
What are the main causes of chronic kidney disease?
# Here, I'm displaying the distribution of BENEFICIARIES on the basis of State
with plt.style.context('seaborn'):
plt.figure(figsize=(10,8))
fig = train_bene_df['RenalDiseaseIndicator'].value_counts().plot(kind='bar', color=['green','orange'])
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/train_bene_df.shape[0],2))+"%"}', (x + width/2, y + height*1.02), ha='center', fontsize=13.5, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nRenal Disease present or not?", fontdict=label_font_dict)
plt.xticks(labels=["NO","YES"], ticks=[0,1], rotation=10, size=12)
plt.ylabel("Number or % share of patients\n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Distribution of BENEFICIARIES on the basis of Renal Disease\n", fontdict=title_font_dict)
# 0 means NO and 1 means YES
print(pd.DataFrame(train_bene_df['RenalDiseaseIndicator'].value_counts()),"\n")
RenalDiseaseIndicator 0 118978 Y 19578
OBSERVATION# Here, I'm displaying the distribution of BENEFICIARIES on the basis of "ChronicCond_KidneyDisease"
with plt.style.context('seaborn'):
plt.figure(figsize=(10,8))
fig = train_bene_df['ChronicCond_KidneyDisease'].value_counts().plot(kind='bar', color=['green','orange'])
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/train_bene_df.shape[0],2))+"%"}', (x + width/2, y + height*1.02), ha='center', fontsize=13.5, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\n Chronic Kidney Disease present or not?", fontdict=label_font_dict)
plt.xticks(labels=["NO","YES"], ticks=[0,1], rotation=10, size=12)
plt.ylabel("Number or % share of patients\n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Distribution of BENEFICIARIES on the basis of 'ChronicCond_KidneyDisease'\n", fontdict=title_font_dict)
# I believe 2 means NO and 1 means YES
print(pd.DataFrame(train_bene_df['ChronicCond_KidneyDisease'].value_counts()),'\n')
ChronicCond_KidneyDisease 2 95277 1 43279
OBSERVATIONI found this link useful in order to understand the difference b/w both of these. It looks they have RenalDisease indicator to represent whether the beneficiary has or had Kidney Failure. And, ChronicCond_KidneyDisease represents the long term Kidney Disease may be like not functioning to the fullest.
# Lets validate whether we have beneficiaries with both RKD & CKD?
with plt.style.context('seaborn'):
plt.figure(figsize=(12,8))
fig = train_bene_df.groupby(['RenalDiseaseIndicator','ChronicCond_KidneyDisease'])['Gender'].count().plot(kind='bar', color=['orange','green','purple','red'])
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/train_bene_df.shape[0],2))+"%"}', (x + width/2, y + height*1.02), ha='center', fontsize=13.5)
# Added the Description for one of the horizontal bar
fig.annotate('This is interesting!!\n Beneficiaries with NO Long Term Kidney Malfunction but\n suffered from Kidney Failure.',
xy=(2.85, 0.85), xycoords='data', xytext=(1.02, 0.65), textcoords='axes fraction', fontsize=12.5,
arrowprops=dict(facecolor='black', shrink=0.03), horizontalalignment='right', verticalalignment='top')
# Providing the labels and title to the graph
plt.xlabel("\nRKD and CKD both are present?", fontdict=label_font_dict)
plt.xticks(labels=[("RKD_No","CKD_Yes"), ("RKD_No","CKD_No"), ("RKD_Yes","CKD_Yes"), ("RKD_Yes","CKD_No")], ticks=[0,1,2,3], rotation=0, size=12)
plt.ylabel("Number or % share of patients\n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Imbalance b/w males and females across the human races?\n", fontdict=title_font_dict)
# RKD --> Renal Disease Indicator
# CKD --> Chronic Condition Kidney Disease
print(pd.DataFrame(train_bene_df.groupby(['RenalDiseaseIndicator','ChronicCond_KidneyDisease'])['Gender'].count()),"\n")
Gender
RenalDiseaseIndicator ChronicCond_KidneyDisease
0 1 27768
2 91210
Y 1 15511
2 4067
OBSERVATIONtrain_bene_df['State'].unique()
array([39, 52, 24, 23, 45, 15, 44, 41, 1, 14, 7, 13, 34, 31, 5, 49, 46,
6, 38, 10, 26, 3, 20, 33, 18, 27, 51, 29, 42, 36, 11, 12, 30, 19,
4, 25, 17, 16, 32, 2, 21, 50, 22, 54, 28, 43, 35, 37, 47, 8, 9,
53], dtype=int64)
# Here, I'm displaying the distribution of BENEFICIARIES on the basis of State
with plt.style.context('seaborn'):
plt.figure(figsize=(20,9))
fig = train_bene_df['State'].value_counts().plot(kind='bar')
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/train_bene_df.shape[0],2))+"%"}', (x + width/2, y + height*1.03), ha='center', fontsize=13.5, rotation=90)
# Providing the labels and title to the graph
plt.xlabel("\nState Codes", fontdict=label_font_dict)
plt.ylabel("Number or % share of patients\n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Distribution of BENEFICIARIES on the basis of States\n", fontdict=title_font_dict)
OBSERVATION# Here, I'm displaying the distribution of BENEFICIARIES on the basis of Country
with plt.style.context('seaborn-poster'):
plt.figure(figsize=(18,8))
fig = train_bene_df['County'].value_counts()[0:45].plot(kind='bar', color='palegreen')
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/train_bene_df.shape[0],2))+"%"}', (x + width/2, y + height*1.03), ha='center', fontsize=13.5, rotation=90)
# Providing the labels and title to the graph
plt.xlabel("\nCountry Codes", fontdict=label_font_dict)
plt.ylabel("Number or % share of patients\n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Distribution of BENEFICIARIES on the basis of Countries (Top-45)", fontdict=title_font_dict)
# Countries with only handful of beneficiaries
train_bene_df['County'].value_counts()[::-1][0:45]
834 1 944 1 804 1 796 1 742 1 744 1 753 1 117 1 521 1 691 2 84 2 875 2 912 2 14 2 904 2 712 2 793 3 785 3 873 3 412 3 955 3 632 3 672 3 161 3 862 3 876 3 844 3 1 3 662 4 612 4 893 4 903 4 931 4 887 4 583 4 34 4 755 4 292 5 431 5 592 5 741 5 361 5 562 5 542 5 797 5 Name: County, dtype: int64
OBSERVATION
NOTE
train_bene_df['NoOfMonths_PartACov'].unique()
array([12, 0, 3, 7, 10, 11, 8, 4, 1, 6, 2, 9, 5], dtype=int64)
# Here, I'm displaying the distribution of BENEFICIARIES on the basis of NoOfMonths_PartACov
with plt.style.context('seaborn-poster'):
plt.figure(figsize=(18,8))
fig = train_bene_df['NoOfMonths_PartACov'].value_counts().plot(kind='bar', color='palegreen')
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/train_bene_df.shape[0],2))+"%"}', (x + width/2, y + height*1.03), ha='center', fontsize=13.5, rotation=90)
# Providing the labels and title to the graph
plt.xlabel("\nMonths for Part-A Coverage", fontdict=label_font_dict)
plt.ylabel("Number or % share of patients\n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Distribution of BENEFICIARIES on the basis of 'NoOfMonths_PartACov'", fontdict=title_font_dict)
OBSERVATIONtrain_bene_df['NoOfMonths_PartBCov'].unique()
array([12, 0, 1, 9, 11, 3, 10, 8, 2, 6, 4, 5, 7], dtype=int64)
# Here, I'm displaying the distribution of BENEFICIARIES on the basis of NoOfMonths_PartBCov
with plt.style.context('seaborn-poster'):
plt.figure(figsize=(18,8))
fig = train_bene_df['NoOfMonths_PartBCov'].value_counts().plot(kind='bar', color='palegreen')
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/train_bene_df.shape[0],2))+"%"}', (x + width/2, y + height*1.03), ha='center', fontsize=13.5, rotation=90)
# Providing the labels and title to the graph
plt.xlabel("\nMonths for Part-B Coverage", fontdict=label_font_dict)
plt.ylabel("Number or % share of patients\n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Distribution of BENEFICIARIES on the basis of 'NoOfMonths_PartBCov'", fontdict=title_font_dict)
OBSERVATION# Here, I'm displaying the distribution of BENEFICIARIES on the basis of 'ChronicCond_Alzheimer'
with plt.style.context('seaborn-poster'):
plt.figure(figsize=(12,8))
fig = train_bene_df['ChronicCond_Alzheimer'].value_counts().plot(kind='bar', color=['palegreen','orange'])
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/train_bene_df.shape[0],2))+"%"}', (x + width/2, y + height*1.01), ha='center', fontsize=13.5, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nHaving Chronic ALZH Disease?", fontdict=label_font_dict)
plt.xticks(ticks=[0,1], labels=['NO', 'YES'], fontsize=13, rotation=30)
plt.ylabel("Number or % share of patients\n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Distribution of BENEFICIARIES on the basis of 'ChronicCond_Alzheimer'\n", fontdict=title_font_dict)
# 1 means +ve with Chronic ALZH Disease
# 2 means -ve with Chronic ALZH Disease
print(pd.DataFrame(train_bene_df['ChronicCond_Alzheimer'].value_counts()),"\n")
ChronicCond_Alzheimer 2 92530 1 46026
OBSERVATION# Here, I'm displaying the Total Annual Sum of Max IP Reimbursement for 'ChronicCond_Alzheimer'
with plt.style.context('seaborn-poster'):
plt.figure(figsize=(12,8))
fig = train_bene_df.groupby(['ChronicCond_Alzheimer'])['IPAnnualReimbursementAmt'].sum().plot(kind='bar', color=['orange','palegreen'])
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/(train_bene_df["IPAnnualReimbursementAmt"].sum()),2))+"%"}', (x + width/2, y + height*1.01), ha='center', fontsize=13.5, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nHaving Chronic ALZH Disease?", fontdict=label_font_dict)
plt.xticks(ticks=[0,1], labels=['YES', 'NO'], fontsize=13, rotation=30)
plt.ylabel("Total Annual Sum of Max IP Reimbursement \n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Total Annual Sum of Max IP Reimbursement : 'ChronicCond_Alzheimer'\n", fontdict=title_font_dict)
# 1 means +ve with Chronic ALZH Disease
# 2 means -ve with Chronic ALZH Disease
print(pd.DataFrame(train_bene_df.groupby(['ChronicCond_Alzheimer'])['IPAnnualReimbursementAmt'].sum()),"\n")
IPAnnualReimbursementAmt ChronicCond_Alzheimer 1 247213280 2 259949690
OBSERVATION# Here, I'm displaying the Total Annual Sum of Max OP Reimbursement for 'ChronicCond_Alzheimer'
with plt.style.context('seaborn-poster'):
plt.figure(figsize=(12,8))
fig = train_bene_df.groupby(['ChronicCond_Alzheimer'])['OPAnnualReimbursementAmt'].sum().plot(kind='bar', color=['orange','palegreen'])
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/(train_bene_df["OPAnnualReimbursementAmt"].sum()),2))+"%"}', (x + width/2, y + height*1.01), ha='center', fontsize=13.5, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nHaving Chronic ALZH Disease?", fontdict=label_font_dict)
plt.xticks(ticks=[0,1], labels=['YES', 'NO'], fontsize=13, rotation=30)
plt.ylabel("Total Annual Sum of Max OP Reimbursement \n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Total Annual Sum of Max OP Reimbursement : 'ChronicCond_Alzheimer'\n", fontdict=title_font_dict)
# 1 means +ve with Chronic ALZH Disease
# 2 means -ve with Chronic ALZH Disease
print(pd.DataFrame(train_bene_df.groupby(['ChronicCond_Alzheimer'])['OPAnnualReimbursementAmt'].sum()),"\n")
OPAnnualReimbursementAmt ChronicCond_Alzheimer 1 74721530 2 105154550
OBSERVATION# Here, I'm displaying the Total Annual Sum of IP Co-payment for 'ChronicCond_Alzheimer'
with plt.style.context('seaborn-poster'):
plt.figure(figsize=(12,8))
fig = train_bene_df.groupby(['ChronicCond_Alzheimer'])['IPAnnualDeductibleAmt'].sum().plot(kind='bar', color=['orange','palegreen'])
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/(train_bene_df["IPAnnualDeductibleAmt"].sum()),2))+"%"}', (x + width/2, y + height*1.01), ha='center', fontsize=13.5, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nHaving Chronic ALZH Disease?", fontdict=label_font_dict)
plt.xticks(ticks=[0,1], labels=['YES', 'NO'], fontsize=13, rotation=30)
plt.ylabel("Total Annual Sum of IP Co-payment \n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Total Annual Sum of IP Co-payment paid by patient : 'ChronicCond_Alzheimer'\n", fontdict=title_font_dict)
# 1 means +ve with Chronic ALZH Disease
# 2 means -ve with Chronic ALZH Disease
print(pd.DataFrame(train_bene_df.groupby(['ChronicCond_Alzheimer'])['IPAnnualDeductibleAmt'].sum()),"\n")
IPAnnualDeductibleAmt ChronicCond_Alzheimer 1 28200854 2 27200388
OBSERVATION# Here, I'm displaying the Total Annual Sum of OP Co-payment for 'ChronicCond_Alzheimer'
with plt.style.context('seaborn-poster'):
plt.figure(figsize=(12,8))
fig = train_bene_df.groupby(['ChronicCond_Alzheimer'])['OPAnnualDeductibleAmt'].sum().plot(kind='bar', color=['orange','palegreen'])
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/(train_bene_df["OPAnnualDeductibleAmt"].sum()),2))+"%"}', (x + width/2, y + height*1.01), ha='center', fontsize=13.5, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nHaving Chronic ALZH Disease?", fontdict=label_font_dict)
plt.xticks(ticks=[0,1], labels=['YES', 'NO'], fontsize=13, rotation=30)
plt.ylabel("Total Annual Sum of OP Co-payment \n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Total Annual Sum of OP Co-payment paid by patient : 'ChronicCond_Alzheimer'\n", fontdict=title_font_dict)
# 1 means +ve with Chronic ALZH Disease
# 2 means -ve with Chronic ALZH Disease
print(pd.DataFrame(train_bene_df.groupby(['ChronicCond_Alzheimer'])['OPAnnualDeductibleAmt'].sum()),"\n")
OPAnnualDeductibleAmt ChronicCond_Alzheimer 1 21502981 2 30832150
OBSERVATION# Number of beneficiaries with chronic or no-chronic conditions
pd.DataFrame(train_bene_df.groupby(['ChronicCond_Alzheimer'])['BeneID'].count())
| BeneID | |
|---|---|
| ChronicCond_Alzheimer | |
| 1 | 46026 |
| 2 | 92530 |
CC_ALZH_IP_R = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Alzheimer'])['IPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['ChronicCond_Alzheimer'])['BeneID'].count())
CC_ALZH_IP_R.columns = ['AVG IP Reimbursement Amt']
CC_ALZH_IP_R
| AVG IP Reimbursement Amt | |
|---|---|
| ChronicCond_Alzheimer | |
| 1 | 5371.165863 |
| 2 | 2809.355777 |
CC_ALZH_OP_R = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Alzheimer'])['OPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['ChronicCond_Alzheimer'])['BeneID'].count())
CC_ALZH_OP_R.columns = ['AVG OP Reimbursement Amt']
CC_ALZH_OP_R
| AVG OP Reimbursement Amt | |
|---|---|
| ChronicCond_Alzheimer | |
| 1 | 1623.463477 |
| 2 | 1136.437372 |
CC_ALZH_IP_D = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Alzheimer'])['IPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['ChronicCond_Alzheimer'])['BeneID'].count())
CC_ALZH_IP_D.columns = ['AVG IP Co-payment Amt']
CC_ALZH_IP_D
| AVG IP Co-payment Amt | |
|---|---|
| ChronicCond_Alzheimer | |
| 1 | 612.715726 |
| 2 | 293.962909 |
CC_ALZH_OP_D = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Alzheimer'])['OPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['ChronicCond_Alzheimer'])['BeneID'].count())
CC_ALZH_OP_D.columns = ['AVG OP Co-payment Amt']
CC_ALZH_OP_D
| AVG OP Co-payment Amt | |
|---|---|
| ChronicCond_Alzheimer | |
| 1 | 467.192044 |
| 2 | 333.212472 |
CC_ALZH_all_amts = pd.concat([CC_ALZH_IP_R, CC_ALZH_OP_R, CC_ALZH_IP_D, CC_ALZH_OP_D], axis=1)
CC_ALZH_all_amts
| AVG IP Reimbursement Amt | AVG OP Reimbursement Amt | AVG IP Co-payment Amt | AVG OP Co-payment Amt | |
|---|---|---|---|---|
| ChronicCond_Alzheimer | ||||
| 1 | 5371.165863 | 1623.463477 | 612.715726 | 467.192044 |
| 2 | 2809.355777 | 1136.437372 | 293.962909 | 333.212472 |
# Here, I'm displaying the Total Annual Sum of IP Co-payment for 'ChronicCond_Alzheimer'
with plt.style.context('seaborn-poster'):
fig = CC_ALZH_all_amts.plot(kind='bar', colormap='rainbow')
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{round(height,0)}', (x + width/2, y + height*1.015), ha='center', fontsize=12, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nHaving Chronic ALZH Disease?", fontdict=label_font_dict)
plt.xticks(ticks=[0,1], labels=['YES', 'NO'], fontsize=13, rotation=30)
plt.ylabel("Total Annual Sum \n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="-.", color='lightgrey')
plt.minorticks_on()
plt.title("Total Annual Sum of various amounts : 'ChronicCond_Alzheimer'\n", fontdict=title_font_dict)
# 1 means +ve with Chronic ALZH Disease
# 2 means -ve with Chronic ALZH Disease
print(CC_ALZH_all_amts,"\n")
AVG IP Reimbursement Amt AVG OP Reimbursement Amt \
ChronicCond_Alzheimer
1 5371.165863 1623.463477
2 2809.355777 1136.437372
AVG IP Co-payment Amt AVG OP Co-payment Amt
ChronicCond_Alzheimer
1 612.715726 467.192044
2 293.962909 333.212472
OBSERVATION# Here, I'm displaying the distribution of BENEFICIARIES on the basis of 'ChronicCond_Heartfailure'
with plt.style.context('seaborn-poster'):
plt.figure(figsize=(12,8))
fig = train_bene_df['ChronicCond_Heartfailure'].value_counts().plot(kind='bar', color=['palegreen','orange'])
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/train_bene_df.shape[0],2))+"%"}', (x + width/2, y + height*1.01), ha='center', fontsize=13.5, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nHaving Chronic HF Disease?", fontdict=label_font_dict)
plt.xticks(ticks=[0,1], labels=['NO', 'YES'], fontsize=13, rotation=30)
plt.ylabel("Number or % share of patients\n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Distribution of BENEFICIARIES on the basis of 'ChronicCond_Heartfailure'\n", fontdict=title_font_dict)
# 1 means +ve with Chronic HF Disease
# 2 means -ve with Chronic HF Disease
print(pd.DataFrame(train_bene_df['ChronicCond_Heartfailure'].value_counts()),"\n")
ChronicCond_Heartfailure 2 70154 1 68402
OBSERVATION# Here, I'm displaying the Total Annual Sum of Max IP Reimbursement for 'ChronicCond_Heartfailure'
with plt.style.context('seaborn-poster'):
plt.figure(figsize=(12,8))
fig = train_bene_df.groupby(['ChronicCond_Heartfailure'])['IPAnnualReimbursementAmt'].sum().plot(kind='bar', color=['orange','palegreen'])
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/(train_bene_df["IPAnnualReimbursementAmt"].sum()),2))+"%"}', (x + width/2, y + height*1.01), ha='center', fontsize=13.5, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nHaving Chronic HF Disease?", fontdict=label_font_dict)
plt.xticks(ticks=[0,1], labels=['YES', 'NO'], fontsize=13, rotation=30)
plt.ylabel("Total Annual Sum of Max IP Reimbursement \n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Total Annual Sum of Max IP Reimbursement : 'ChronicCond_Heartfailure'\n", fontdict=title_font_dict)
# 1 means +ve with Chronic HF Disease
# 2 means -ve with Chronic HF Disease
print(pd.DataFrame(train_bene_df.groupby(['ChronicCond_Heartfailure'])['IPAnnualReimbursementAmt'].sum()),"\n")
IPAnnualReimbursementAmt ChronicCond_Heartfailure 1 370856140 2 136306830
OBSERVATION# Here, I'm displaying the Total Annual Sum of Max OP Reimbursement for 'ChronicCond_Heartfailure'
with plt.style.context('seaborn-poster'):
plt.figure(figsize=(12,8))
fig = train_bene_df.groupby(['ChronicCond_Heartfailure'])['OPAnnualReimbursementAmt'].sum().plot(kind='bar', color=['orange','palegreen'])
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/(train_bene_df["OPAnnualReimbursementAmt"].sum()),2))+"%"}', (x + width/2, y + height*1.01), ha='center', fontsize=13.5, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nHaving Chronic HF Disease?", fontdict=label_font_dict)
plt.xticks(ticks=[0,1], labels=['YES', 'NO'], fontsize=13, rotation=30)
plt.ylabel("Total Annual Sum of Max OP Reimbursement \n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Total Annual Sum of Max OP Reimbursement : 'ChronicCond_Heartfailure'\n", fontdict=title_font_dict)
# 1 means +ve with Chronic HF Disease
# 2 means -ve with Chronic HF Disease
print(pd.DataFrame(train_bene_df.groupby(['ChronicCond_Heartfailure'])['OPAnnualReimbursementAmt'].sum()),"\n")
OPAnnualReimbursementAmt ChronicCond_Heartfailure 1 114280500 2 65595580
OBSERVATION# Here, I'm displaying the Total Annual Sum of IP Co-payment for 'ChronicCond_Heartfailure'
with plt.style.context('seaborn-poster'):
plt.figure(figsize=(12,8))
fig = train_bene_df.groupby(['ChronicCond_Heartfailure'])['IPAnnualDeductibleAmt'].sum().plot(kind='bar', color=['orange','palegreen'])
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/(train_bene_df["IPAnnualDeductibleAmt"].sum()),2))+"%"}', (x + width/2, y + height*1.01), ha='center', fontsize=13.5, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nHaving Chronic HF Disease?", fontdict=label_font_dict)
plt.xticks(ticks=[0,1], labels=['YES', 'NO'], fontsize=13, rotation=30)
plt.ylabel("Total Annual Sum of IP Co-payment \n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Total Annual Sum of IP Co-payment paid by patient : 'ChronicCond_Heartfailure'\n", fontdict=title_font_dict)
# 1 means +ve with Chronic HF Disease
# 2 means -ve with Chronic HF Disease
print(pd.DataFrame(train_bene_df.groupby(['ChronicCond_Heartfailure'])['IPAnnualDeductibleAmt'].sum()),"\n")
IPAnnualDeductibleAmt ChronicCond_Heartfailure 1 40303650 2 15097592
OBSERVATION# Here, I'm displaying the Total Annual Sum of OP Co-payment for 'ChronicCond_Heartfailure'
with plt.style.context('seaborn-poster'):
plt.figure(figsize=(12,8))
fig = train_bene_df.groupby(['ChronicCond_Heartfailure'])['OPAnnualDeductibleAmt'].sum().plot(kind='bar', color=['orange','palegreen'])
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{str(round((height*100)/(train_bene_df["OPAnnualDeductibleAmt"].sum()),2))+"%"}', (x + width/2, y + height*1.01), ha='center', fontsize=13.5, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nHaving Chronic HF Disease?", fontdict=label_font_dict)
plt.xticks(ticks=[0,1], labels=['YES', 'NO'], fontsize=13, rotation=30)
plt.ylabel("Total Annual Sum of OP Co-payment \n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="--", color='lightgrey')
plt.minorticks_on()
plt.title("Total Annual Sum of OP Co-payment paid by patient : 'ChronicCond_Heartfailure'\n", fontdict=title_font_dict)
# 1 means +ve with Chronic HF Disease
# 2 means -ve with Chronic HF Disease
print(pd.DataFrame(train_bene_df.groupby(['ChronicCond_Heartfailure'])['OPAnnualDeductibleAmt'].sum()),"\n")
OPAnnualDeductibleAmt ChronicCond_Heartfailure 1 32649838 2 19685293
OBSERVATION# Number of beneficiaries with chronic or no-chronic conditions
pd.DataFrame(train_bene_df.groupby(['ChronicCond_Heartfailure'])['BeneID'].count())
| BeneID | |
|---|---|
| ChronicCond_Heartfailure | |
| 1 | 68402 |
| 2 | 70154 |
CC_HF_IP_R = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Heartfailure'])['IPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['ChronicCond_Heartfailure'])['BeneID'].count())
CC_HF_IP_R.columns = ['AVG IP Reimbursement Amt']
CC_HF_IP_R
| AVG IP Reimbursement Amt | |
|---|---|
| ChronicCond_Heartfailure | |
| 1 | 5421.714862 |
| 2 | 1942.965904 |
CC_HF_OP_R = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Heartfailure'])['OPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['ChronicCond_Heartfailure'])['BeneID'].count())
CC_HF_OP_R.columns = ['AVG OP Reimbursement Amt']
CC_HF_OP_R
| AVG OP Reimbursement Amt | |
|---|---|
| ChronicCond_Heartfailure | |
| 1 | 1670.718692 |
| 2 | 935.022664 |
CC_HF_IP_D = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Heartfailure'])['IPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['ChronicCond_Heartfailure'])['BeneID'].count())
CC_HF_IP_D.columns = ['AVG IP Co-payment Amt']
CC_HF_IP_D
| AVG IP Co-payment Amt | |
|---|---|
| ChronicCond_Heartfailure | |
| 1 | 589.217421 |
| 2 | 215.206432 |
CC_HF_OP_D = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Heartfailure'])['OPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['ChronicCond_Heartfailure'])['BeneID'].count())
CC_HF_OP_D.columns = ['AVG OP Co-payment Amt']
CC_HF_OP_D
| AVG OP Co-payment Amt | |
|---|---|
| ChronicCond_Heartfailure | |
| 1 | 477.322856 |
| 2 | 280.601149 |
CC_HF_all_amts = pd.concat([CC_HF_IP_R, CC_HF_OP_R, CC_HF_IP_D, CC_HF_OP_D], axis=1)
CC_HF_all_amts
| AVG IP Reimbursement Amt | AVG OP Reimbursement Amt | AVG IP Co-payment Amt | AVG OP Co-payment Amt | |
|---|---|---|---|---|
| ChronicCond_Heartfailure | ||||
| 1 | 5421.714862 | 1670.718692 | 589.217421 | 477.322856 |
| 2 | 1942.965904 | 935.022664 | 215.206432 | 280.601149 |
# Here, I'm displaying the Total Annual Sum of IP Co-payment for 'ChronicCond_Heartfailure'
with plt.style.context('seaborn-poster'):
fig = CC_HF_all_amts.plot(kind='bar', colormap='rainbow')
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{round(height,0)}', (x + width/2, y + height*1.015), ha='center', fontsize=12, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nHaving Chronic HF Disease?", fontdict=label_font_dict)
plt.xticks(ticks=[0,1], labels=['YES', 'NO'], fontsize=13, rotation=30)
plt.ylabel("Total Annual Sum \n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="-.", color='lightgrey')
plt.minorticks_on()
plt.title("Total Annual Sum of various amounts : 'ChronicCond_Heartfailure'\n", fontdict=title_font_dict)
# 1 means +ve with Chronic HF Disease
# 2 means -ve with Chronic HF Disease
print(CC_HF_all_amts,"\n")
AVG IP Reimbursement Amt AVG OP Reimbursement Amt \
ChronicCond_Heartfailure
1 5421.714862 1670.718692
2 1942.965904 935.022664
AVG IP Co-payment Amt AVG OP Co-payment Amt
ChronicCond_Heartfailure
1 589.217421 477.322856
2 215.206432 280.601149
OBSERVATION# Number of beneficiaries with chronic or no-chronic conditions
pd.DataFrame(train_bene_df.groupby(['ChronicCond_KidneyDisease'])['BeneID'].count())
| BeneID | |
|---|---|
| ChronicCond_KidneyDisease | |
| 1 | 43279 |
| 2 | 95277 |
CC_KD_IP_R = pd.DataFrame(train_bene_df.groupby(['ChronicCond_KidneyDisease'])['IPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['ChronicCond_KidneyDisease'])['BeneID'].count())
CC_KD_IP_R.columns = ['AVG IP Reimbursement Amt']
CC_KD_IP_R
| AVG IP Reimbursement Amt | |
|---|---|
| ChronicCond_KidneyDisease | |
| 1 | 7500.768040 |
| 2 | 1915.858287 |
CC_KD_OP_R = pd.DataFrame(train_bene_df.groupby(['ChronicCond_KidneyDisease'])['OPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['ChronicCond_KidneyDisease'])['BeneID'].count())
CC_KD_OP_R.columns = ['AVG OP Reimbursement Amt']
CC_KD_OP_R
| AVG OP Reimbursement Amt | |
|---|---|
| ChronicCond_KidneyDisease | |
| 1 | 2061.602625 |
| 2 | 951.457120 |
CC_KD_IP_D = pd.DataFrame(train_bene_df.groupby(['ChronicCond_KidneyDisease'])['IPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['ChronicCond_KidneyDisease'])['BeneID'].count())
CC_KD_IP_D.columns = ['AVG IP Co-payment Amt']
CC_KD_IP_D
| AVG IP Co-payment Amt | |
|---|---|
| ChronicCond_KidneyDisease | |
| 1 | 785.385707 |
| 2 | 224.718809 |
CC_KD_OP_D = pd.DataFrame(train_bene_df.groupby(['ChronicCond_KidneyDisease'])['OPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['ChronicCond_KidneyDisease'])['BeneID'].count())
CC_KD_OP_D.columns = ['AVG OP Co-payment Amt']
CC_KD_OP_D
| AVG OP Co-payment Amt | |
|---|---|
| ChronicCond_KidneyDisease | |
| 1 | 582.706417 |
| 2 | 284.603629 |
CC_KD_all_amts = pd.concat([CC_KD_IP_R, CC_KD_OP_R, CC_KD_IP_D, CC_KD_OP_D], axis=1)
CC_KD_all_amts
| AVG IP Reimbursement Amt | AVG OP Reimbursement Amt | AVG IP Co-payment Amt | AVG OP Co-payment Amt | |
|---|---|---|---|---|
| ChronicCond_KidneyDisease | ||||
| 1 | 7500.768040 | 2061.602625 | 785.385707 | 582.706417 |
| 2 | 1915.858287 | 951.457120 | 224.718809 | 284.603629 |
# Here, I'm displaying the Total Annual Sum of IP Co-payment for 'ChronicCond_KidneyDisease'
with plt.style.context('seaborn-poster'):
fig = CC_KD_all_amts.plot(kind='bar', colormap='rainbow')
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{round(height,0)}', (x + width/2, y + height*1.015), ha='center', fontsize=12, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nHaving Chronic KD Disease?", fontdict=label_font_dict)
plt.xticks(ticks=[0,1], labels=['YES', 'NO'], fontsize=13, rotation=30)
plt.ylabel("Total Annual Sum \n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="-.", color='lightgrey')
plt.minorticks_on()
plt.title("Total Annual Sum of various amounts : 'ChronicCond_KidneyDisease'\n", fontdict=title_font_dict)
# 1 means +ve with Chronic KD Disease
# 2 means -ve with Chronic KD Disease
print(CC_KD_all_amts,"\n")
AVG IP Reimbursement Amt AVG OP Reimbursement Amt \
ChronicCond_KidneyDisease
1 7500.768040 2061.602625
2 1915.858287 951.457120
AVG IP Co-payment Amt AVG OP Co-payment Amt
ChronicCond_KidneyDisease
1 785.385707 582.706417
2 224.718809 284.603629
OBSERVATION# Number of beneficiaries with chronic or no-chronic conditions
pd.DataFrame(train_bene_df.groupby(['ChronicCond_Cancer'])['BeneID'].count())
| BeneID | |
|---|---|
| ChronicCond_Cancer | |
| 1 | 16621 |
| 2 | 121935 |
CC_CN_IP_R = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Cancer'])['IPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['ChronicCond_Cancer'])['BeneID'].count())
CC_CN_IP_R.columns = ['AVG IP Reimbursement Amt']
CC_CN_IP_R
| AVG IP Reimbursement Amt | |
|---|---|
| ChronicCond_Cancer | |
| 1 | 6067.532038 |
| 2 | 3332.222250 |
CC_CN_OP_R = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Cancer'])['OPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['ChronicCond_Cancer'])['BeneID'].count())
CC_CN_OP_R.columns = ['AVG OP Reimbursement Amt']
CC_CN_OP_R
| AVG OP Reimbursement Amt | |
|---|---|
| ChronicCond_Cancer | |
| 1 | 1788.427291 |
| 2 | 1231.398942 |
CC_CN_IP_D = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Cancer'])['IPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['ChronicCond_Cancer'])['BeneID'].count())
CC_CN_IP_D.columns = ['AVG IP Co-payment Amt']
CC_CN_IP_D
| AVG IP Co-payment Amt | |
|---|---|
| ChronicCond_Cancer | |
| 1 | 636.339811 |
| 2 | 367.610924 |
CC_CN_OP_D = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Cancer'])['OPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['ChronicCond_Cancer'])['BeneID'].count())
CC_CN_OP_D.columns = ['AVG OP Co-payment Amt']
CC_CN_OP_D
| AVG OP Co-payment Amt | |
|---|---|
| ChronicCond_Cancer | |
| 1 | 520.832682 |
| 2 | 358.210284 |
CC_CN_all_amts = pd.concat([CC_CN_IP_R, CC_CN_OP_R, CC_CN_IP_D, CC_CN_OP_D], axis=1)
CC_CN_all_amts
| AVG IP Reimbursement Amt | AVG OP Reimbursement Amt | AVG IP Co-payment Amt | AVG OP Co-payment Amt | |
|---|---|---|---|---|
| ChronicCond_Cancer | ||||
| 1 | 6067.532038 | 1788.427291 | 636.339811 | 520.832682 |
| 2 | 3332.222250 | 1231.398942 | 367.610924 | 358.210284 |
# Here, I'm displaying the Total Annual Sum of IP Co-payment for 'ChronicCond_Cancer'
with plt.style.context('seaborn-poster'):
fig = CC_CN_all_amts.plot(kind='bar', colormap='rainbow')
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{round(height,0)}', (x + width/2, y + height*1.015), ha='center', fontsize=12, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nHaving Chronic CN Disease?", fontdict=label_font_dict)
plt.xticks(ticks=[0,1], labels=['YES', 'NO'], fontsize=13, rotation=30)
plt.ylabel("Total Annual Sum \n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="-.", color='lightgrey')
plt.minorticks_on()
plt.title("Total Annual Sum of various amounts : 'ChronicCond_Cancer'\n", fontdict=title_font_dict)
# 1 means +ve with Chronic CN Disease
# 2 means -ve with Chronic CN Disease
print(CC_CN_all_amts,"\n")
AVG IP Reimbursement Amt AVG OP Reimbursement Amt \
ChronicCond_Cancer
1 6067.532038 1788.427291
2 3332.222250 1231.398942
AVG IP Co-payment Amt AVG OP Co-payment Amt
ChronicCond_Cancer
1 636.339811 520.832682
2 367.610924 358.210284
OBSERVATION# Number of beneficiaries with chronic or no-chronic conditions
pd.DataFrame(train_bene_df.groupby(['ChronicCond_ObstrPulmonary'])['BeneID'].count())
| BeneID | |
|---|---|
| ChronicCond_ObstrPulmonary | |
| 1 | 32859 |
| 2 | 105697 |
CC_PL_IP_R = pd.DataFrame(train_bene_df.groupby(['ChronicCond_ObstrPulmonary'])['IPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['ChronicCond_ObstrPulmonary'])['BeneID'].count())
CC_PL_IP_R.columns = ['AVG IP Reimbursement Amt']
CC_PL_IP_R
| AVG IP Reimbursement Amt | |
|---|---|
| ChronicCond_ObstrPulmonary | |
| 1 | 7361.772422 |
| 2 | 2509.650132 |
CC_PL_OP_R = pd.DataFrame(train_bene_df.groupby(['ChronicCond_ObstrPulmonary'])['OPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['ChronicCond_ObstrPulmonary'])['BeneID'].count())
CC_PL_OP_R.columns = ['AVG OP Reimbursement Amt']
CC_PL_OP_R
| AVG OP Reimbursement Amt | |
|---|---|
| ChronicCond_ObstrPulmonary | |
| 1 | 1810.864908 |
| 2 | 1138.848501 |
CC_PL_IP_D = pd.DataFrame(train_bene_df.groupby(['ChronicCond_ObstrPulmonary'])['IPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['ChronicCond_ObstrPulmonary'])['BeneID'].count())
CC_PL_IP_D.columns = ['AVG IP Co-payment Amt']
CC_PL_IP_D
| AVG IP Co-payment Amt | |
|---|---|
| ChronicCond_ObstrPulmonary | |
| 1 | 818.228917 |
| 2 | 269.781148 |
CC_PL_OP_D = pd.DataFrame(train_bene_df.groupby(['ChronicCond_ObstrPulmonary'])['OPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['ChronicCond_ObstrPulmonary'])['BeneID'].count())
CC_PL_OP_D.columns = ['AVG OP Co-payment Amt']
CC_PL_OP_D
| AVG OP Co-payment Amt | |
|---|---|
| ChronicCond_ObstrPulmonary | |
| 1 | 519.44767 |
| 2 | 333.65753 |
CC_PL_all_amts = pd.concat([CC_PL_IP_R, CC_PL_OP_R, CC_PL_IP_D, CC_PL_OP_D], axis=1)
CC_PL_all_amts
| AVG IP Reimbursement Amt | AVG OP Reimbursement Amt | AVG IP Co-payment Amt | AVG OP Co-payment Amt | |
|---|---|---|---|---|
| ChronicCond_ObstrPulmonary | ||||
| 1 | 7361.772422 | 1810.864908 | 818.228917 | 519.44767 |
| 2 | 2509.650132 | 1138.848501 | 269.781148 | 333.65753 |
# Here, I'm displaying the Total Annual Sum of IP Co-payment for 'ChronicCond_ObstrPulmonary'
with plt.style.context('seaborn-poster'):
fig = CC_PL_all_amts.plot(kind='bar', colormap='rainbow')
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{round(height,0)}', (x + width/2, y + height*1.015), ha='center', fontsize=12, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nHaving Chronic PL Disease?", fontdict=label_font_dict)
plt.xticks(ticks=[0,1], labels=['YES', 'NO'], fontsize=13, rotation=30)
plt.ylabel("Total Annual Sum \n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="-.", color='lightgrey')
plt.minorticks_on()
plt.title("Total Annual Sum of various amounts : 'ChronicCond_ObstrPulmonary'\n", fontdict=title_font_dict)
# 1 means +ve with Chronic PL Disease
# 2 means -ve with Chronic PL Disease
OBSERVATION# Number of beneficiaries with chronic or no-chronic conditions
pd.DataFrame(train_bene_df.groupby(['ChronicCond_Depression'])['BeneID'].count())
| BeneID | |
|---|---|
| ChronicCond_Depression | |
| 1 | 49260 |
| 2 | 89296 |
CC_DP_IP_R = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Depression'])['IPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['ChronicCond_Depression'])['BeneID'].count())
CC_DP_IP_R.columns = ['AVG IP Reimbursement Amt']
CC_DP_IP_R
| AVG IP Reimbursement Amt | |
|---|---|
| ChronicCond_Depression | |
| 1 | 5022.162810 |
| 2 | 2909.102647 |
CC_DP_OP_R = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Depression'])['OPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['ChronicCond_Depression'])['BeneID'].count())
CC_DP_OP_R.columns = ['AVG OP Reimbursement Amt']
CC_DP_OP_R
| AVG OP Reimbursement Amt | |
|---|---|
| ChronicCond_Depression | |
| 1 | 1603.871701 |
| 2 | 1129.606701 |
CC_DP_IP_D = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Depression'])['IPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['ChronicCond_Depression'])['BeneID'].count())
CC_DP_IP_D.columns = ['AVG IP Co-payment Amt']
CC_DP_IP_D
| AVG IP Co-payment Amt | |
|---|---|
| ChronicCond_Depression | |
| 1 | 570.054771 |
| 2 | 305.952607 |
CC_DP_OP_D = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Depression'])['OPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['ChronicCond_Depression'])['BeneID'].count())
CC_DP_OP_D.columns = ['AVG OP Co-payment Amt']
CC_DP_OP_D
| AVG OP Co-payment Amt | |
|---|---|
| ChronicCond_Depression | |
| 1 | 464.285708 |
| 2 | 329.963459 |
CC_DP_all_amts = pd.concat([CC_DP_IP_R, CC_DP_OP_R, CC_DP_IP_D, CC_DP_OP_D], axis=1)
CC_DP_all_amts
| AVG IP Reimbursement Amt | AVG OP Reimbursement Amt | AVG IP Co-payment Amt | AVG OP Co-payment Amt | |
|---|---|---|---|---|
| ChronicCond_Depression | ||||
| 1 | 5022.162810 | 1603.871701 | 570.054771 | 464.285708 |
| 2 | 2909.102647 | 1129.606701 | 305.952607 | 329.963459 |
# Here, I'm displaying the Total Annual Sum of IP Co-payment for 'ChronicCond_Depression'
with plt.style.context('seaborn-poster'):
fig = CC_DP_all_amts.plot(kind='bar', colormap='rainbow')
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{round(height,0)}', (x + width/2, y + height*1.015), ha='center', fontsize=12, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nHaving Chronic DP Disease?", fontdict=label_font_dict)
plt.xticks(ticks=[0,1], labels=['YES', 'NO'], fontsize=13, rotation=30)
plt.ylabel("Total Annual Sum \n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="-.", color='lightgrey')
plt.minorticks_on()
plt.title("Total Annual Sum of various amounts : 'ChronicCond_Depression'\n", fontdict=title_font_dict)
# 1 means +ve with Chronic DP Disease
# 2 means -ve with Chronic DP Disease
OBSERVATION# Number of beneficiaries with chronic or no-chronic conditions
pd.DataFrame(train_bene_df.groupby(['ChronicCond_Diabetes'])['BeneID'].count())
| BeneID | |
|---|---|
| ChronicCond_Diabetes | |
| 1 | 83391 |
| 2 | 55165 |
CC_DB_IP_R = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Diabetes'])['IPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['ChronicCond_Diabetes'])['BeneID'].count())
CC_DB_IP_R.columns = ['AVG IP Reimbursement Amt']
CC_DB_IP_R
| AVG IP Reimbursement Amt | |
|---|---|
| ChronicCond_Diabetes | |
| 1 | 4870.622010 |
| 2 | 1830.815372 |
CC_DB_OP_R = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Diabetes'])['OPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['ChronicCond_Diabetes'])['BeneID'].count())
CC_DB_OP_R.columns = ['AVG OP Reimbursement Amt']
CC_DB_OP_R
| AVG OP Reimbursement Amt | |
|---|---|
| ChronicCond_Diabetes | |
| 1 | 1607.259536 |
| 2 | 831.054110 |
CC_DB_IP_D = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Diabetes'])['IPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['ChronicCond_Diabetes'])['BeneID'].count())
CC_DB_IP_D.columns = ['AVG IP Co-payment Amt']
CC_DB_IP_D
| AVG IP Co-payment Amt | |
|---|---|
| ChronicCond_Diabetes | |
| 1 | 534.254680 |
| 2 | 196.668359 |
CC_DB_OP_D = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Diabetes'])['OPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['ChronicCond_Diabetes'])['BeneID'].count())
CC_DB_OP_D.columns = ['AVG OP Co-payment Amt']
CC_DB_OP_D
| AVG OP Co-payment Amt | |
|---|---|
| ChronicCond_Diabetes | |
| 1 | 464.345421 |
| 2 | 246.767008 |
CC_DB_all_amts = pd.concat([CC_DB_IP_R, CC_DB_OP_R, CC_DB_IP_D, CC_DB_OP_D], axis=1)
CC_DB_all_amts
| AVG IP Reimbursement Amt | AVG OP Reimbursement Amt | AVG IP Co-payment Amt | AVG OP Co-payment Amt | |
|---|---|---|---|---|
| ChronicCond_Diabetes | ||||
| 1 | 4870.622010 | 1607.259536 | 534.254680 | 464.345421 |
| 2 | 1830.815372 | 831.054110 | 196.668359 | 246.767008 |
# Here, I'm displaying the Total Annual Sum of IP Co-payment for 'ChronicCond_Diabetes'
with plt.style.context('seaborn-poster'):
fig = CC_DB_all_amts.plot(kind='bar', colormap='rainbow')
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{round(height,0)}', (x + width/2, y + height*1.015), ha='center', fontsize=12, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nHaving Chronic DB Disease?", fontdict=label_font_dict)
plt.xticks(ticks=[0,1], labels=['YES', 'NO'], fontsize=13, rotation=30)
plt.ylabel("Total Annual Sum \n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="-.", color='lightgrey')
plt.minorticks_on()
plt.title("Total Annual Sum of various amounts : 'ChronicCond_Diabetes'\n", fontdict=title_font_dict)
# 1 means +ve with Chronic DB Disease
# 2 means -ve with Chronic DB Disease
OBSERVATION# Number of beneficiaries with chronic or no-chronic conditions
pd.DataFrame(train_bene_df.groupby(['ChronicCond_IschemicHeart'])['BeneID'].count())
| BeneID | |
|---|---|
| ChronicCond_IschemicHeart | |
| 1 | 93644 |
| 2 | 44912 |
CC_IH_IP_R = pd.DataFrame(train_bene_df.groupby(['ChronicCond_IschemicHeart'])['IPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['ChronicCond_IschemicHeart'])['BeneID'].count())
CC_IH_IP_R.columns = ['AVG IP Reimbursement Amt']
CC_IH_IP_R
| AVG IP Reimbursement Amt | |
|---|---|
| ChronicCond_IschemicHeart | |
| 1 | 4697.683034 |
| 2 | 1497.442554 |
CC_IH_OP_R = pd.DataFrame(train_bene_df.groupby(['ChronicCond_IschemicHeart'])['OPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['ChronicCond_IschemicHeart'])['BeneID'].count())
CC_IH_OP_R.columns = ['AVG OP Reimbursement Amt']
CC_IH_OP_R
| AVG OP Reimbursement Amt | |
|---|---|
| ChronicCond_IschemicHeart | |
| 1 | 1521.077485 |
| 2 | 833.547827 |
CC_IH_IP_D = pd.DataFrame(train_bene_df.groupby(['ChronicCond_IschemicHeart'])['IPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['ChronicCond_IschemicHeart'])['BeneID'].count())
CC_IH_IP_D.columns = ['AVG IP Co-payment Amt']
CC_IH_IP_D
| AVG IP Co-payment Amt | |
|---|---|
| ChronicCond_IschemicHeart | |
| 1 | 506.661676 |
| 2 | 177.133416 |
CC_IH_OP_D = pd.DataFrame(train_bene_df.groupby(['ChronicCond_IschemicHeart'])['OPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['ChronicCond_IschemicHeart'])['BeneID'].count())
CC_IH_OP_D.columns = ['AVG OP Co-payment Amt']
CC_IH_OP_D
| AVG OP Co-payment Amt | |
|---|---|
| ChronicCond_IschemicHeart | |
| 1 | 441.447535 |
| 2 | 244.839197 |
CC_IH_all_amts = pd.concat([CC_IH_IP_R, CC_IH_OP_R, CC_IH_IP_D, CC_IH_OP_D], axis=1)
CC_IH_all_amts
| AVG IP Reimbursement Amt | AVG OP Reimbursement Amt | AVG IP Co-payment Amt | AVG OP Co-payment Amt | |
|---|---|---|---|---|
| ChronicCond_IschemicHeart | ||||
| 1 | 4697.683034 | 1521.077485 | 506.661676 | 441.447535 |
| 2 | 1497.442554 | 833.547827 | 177.133416 | 244.839197 |
# Here, I'm displaying the Total Annual Sum of IP Co-payment for 'ChronicCond_IschemicHeart'
with plt.style.context('seaborn-poster'):
fig = CC_IH_all_amts.plot(kind='bar', colormap='rainbow')
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{round(height,0)}', (x + width/2, y + height*1.015), ha='center', fontsize=12, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nHaving Chronic IH Disease?", fontdict=label_font_dict)
plt.xticks(ticks=[0,1], labels=['YES', 'NO'], fontsize=13, rotation=30)
plt.ylabel("Total Annual Sum \n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="-.", color='lightgrey')
plt.minorticks_on()
plt.title("Total Annual Sum of various amounts : 'ChronicCond_IschemicHeart'\n", fontdict=title_font_dict)
# 1 means +ve with Chronic IH Disease
# 2 means -ve with Chronic IH Disease
OBSERVATION# Number of beneficiaries with chronic or no-chronic conditions
pd.DataFrame(train_bene_df.groupby(['ChronicCond_Osteoporasis'])['BeneID'].count())
| BeneID | |
|---|---|
| ChronicCond_Osteoporasis | |
| 1 | 38059 |
| 2 | 100497 |
CC_OS_IP_R = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Osteoporasis'])['IPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['ChronicCond_Osteoporasis'])['BeneID'].count())
CC_OS_IP_R.columns = ['AVG IP Reimbursement Amt']
CC_OS_IP_R
| AVG IP Reimbursement Amt | |
|---|---|
| ChronicCond_Osteoporasis | |
| 1 | 4608.419822 |
| 2 | 3301.303721 |
CC_OS_OP_R = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Osteoporasis'])['OPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['ChronicCond_Osteoporasis'])['BeneID'].count())
CC_OS_OP_R.columns = ['AVG OP Reimbursement Amt']
CC_OS_OP_R
| AVG OP Reimbursement Amt | |
|---|---|
| ChronicCond_Osteoporasis | |
| 1 | 1494.709004 |
| 2 | 1223.807178 |
CC_OS_IP_D = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Osteoporasis'])['IPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['ChronicCond_Osteoporasis'])['BeneID'].count())
CC_OS_IP_D.columns = ['AVG IP Co-payment Amt']
CC_OS_IP_D
| AVG IP Co-payment Amt | |
|---|---|
| ChronicCond_Osteoporasis | |
| 1 | 512.038204 |
| 2 | 357.359722 |
CC_OS_OP_D = pd.DataFrame(train_bene_df.groupby(['ChronicCond_Osteoporasis'])['OPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['ChronicCond_Osteoporasis'])['BeneID'].count())
CC_OS_OP_D.columns = ['AVG OP Co-payment Amt']
CC_OS_OP_D
| AVG OP Co-payment Amt | |
|---|---|
| ChronicCond_Osteoporasis | |
| 1 | 436.249271 |
| 2 | 355.552106 |
CC_OS_all_amts = pd.concat([CC_OS_IP_R, CC_OS_OP_R, CC_OS_IP_D, CC_OS_OP_D], axis=1)
CC_OS_all_amts
| AVG IP Reimbursement Amt | AVG OP Reimbursement Amt | AVG IP Co-payment Amt | AVG OP Co-payment Amt | |
|---|---|---|---|---|
| ChronicCond_Osteoporasis | ||||
| 1 | 4608.419822 | 1494.709004 | 512.038204 | 436.249271 |
| 2 | 3301.303721 | 1223.807178 | 357.359722 | 355.552106 |
# Here, I'm displaying the Total Annual Sum of IP Co-payment for 'ChronicCond_Osteoporasis'
with plt.style.context('seaborn-poster'):
fig = CC_OS_all_amts.plot(kind='bar', colormap='rainbow')
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{round(height,0)}', (x + width/2, y + height*1.015), ha='center', fontsize=12, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nHaving Chronic OS Disease?", fontdict=label_font_dict)
plt.xticks(ticks=[0,1], labels=['YES', 'NO'], fontsize=13, rotation=30)
plt.ylabel("Total Annual Sum \n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="-.", color='lightgrey')
plt.minorticks_on()
plt.title("Total Annual Sum of various amounts : 'ChronicCond_Osteoporasis'\n", fontdict=title_font_dict)
# 1 means +ve with Chronic OS Disease
# 2 means -ve with Chronic OS Disease
OBSERVATION# Number of beneficiaries with chronic or no-chronic conditions
pd.DataFrame(train_bene_df.groupby(['ChronicCond_rheumatoidarthritis'])['BeneID'].count())
| BeneID | |
|---|---|
| ChronicCond_rheumatoidarthritis | |
| 1 | 35584 |
| 2 | 102972 |
CC_RH_IP_R = pd.DataFrame(train_bene_df.groupby(['ChronicCond_rheumatoidarthritis'])['IPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['ChronicCond_rheumatoidarthritis'])['BeneID'].count())
CC_RH_IP_R.columns = ['AVG IP Reimbursement Amt']
CC_RH_IP_R
| AVG IP Reimbursement Amt | |
|---|---|
| ChronicCond_rheumatoidarthritis | |
| 1 | 5102.027597 |
| 2 | 3162.145243 |
CC_RH_OP_R = pd.DataFrame(train_bene_df.groupby(['ChronicCond_rheumatoidarthritis'])['OPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['ChronicCond_rheumatoidarthritis'])['BeneID'].count())
CC_RH_OP_R.columns = ['AVG OP Reimbursement Amt']
CC_RH_OP_R
| AVG OP Reimbursement Amt | |
|---|---|
| ChronicCond_rheumatoidarthritis | |
| 1 | 1548.057835 |
| 2 | 1211.882745 |
CC_RH_IP_D = pd.DataFrame(train_bene_df.groupby(['ChronicCond_rheumatoidarthritis'])['IPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['ChronicCond_rheumatoidarthritis'])['BeneID'].count())
CC_RH_IP_D.columns = ['AVG IP Co-payment Amt']
CC_RH_IP_D
| AVG IP Co-payment Amt | |
|---|---|
| ChronicCond_rheumatoidarthritis | |
| 1 | 562.630789 |
| 2 | 343.594259 |
CC_RH_OP_D = pd.DataFrame(train_bene_df.groupby(['ChronicCond_rheumatoidarthritis'])['OPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['ChronicCond_rheumatoidarthritis'])['BeneID'].count())
CC_RH_OP_D.columns = ['AVG OP Co-payment Amt']
CC_RH_OP_D
| AVG OP Co-payment Amt | |
|---|---|
| ChronicCond_rheumatoidarthritis | |
| 1 | 453.423196 |
| 2 | 351.556928 |
CC_RH_all_amts = pd.concat([CC_RH_IP_R, CC_RH_OP_R, CC_RH_IP_D, CC_RH_OP_D], axis=1)
CC_RH_all_amts
| AVG IP Reimbursement Amt | AVG OP Reimbursement Amt | AVG IP Co-payment Amt | AVG OP Co-payment Amt | |
|---|---|---|---|---|
| ChronicCond_rheumatoidarthritis | ||||
| 1 | 5102.027597 | 1548.057835 | 562.630789 | 453.423196 |
| 2 | 3162.145243 | 1211.882745 | 343.594259 | 351.556928 |
# Here, I'm displaying the Total Annual Sum of IP Co-payment for 'ChronicCond_rheumatoidarthritis'
with plt.style.context('seaborn-poster'):
fig = CC_RH_all_amts.plot(kind='bar', colormap='rainbow')
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{round(height,0)}', (x + width/2, y + height*1.015), ha='center', fontsize=12, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nHaving Chronic RH Disease?", fontdict=label_font_dict)
plt.xticks(ticks=[0,1], labels=['YES', 'NO'], fontsize=13, rotation=30)
plt.ylabel("Total Annual Sum \n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="-.", color='lightgrey')
plt.minorticks_on()
plt.title("Total Annual Sum of various amounts : 'ChronicCond_rheumatoidarthritis'\n", fontdict=title_font_dict)
# 1 means +ve with Chronic RH Disease
# 2 means -ve with Chronic RH Disease
OBSERVATION# Number of beneficiaries with chronic or no-chronic conditions
pd.DataFrame(train_bene_df.groupby(['ChronicCond_stroke'])['BeneID'].count())
| BeneID | |
|---|---|
| ChronicCond_stroke | |
| 1 | 10954 |
| 2 | 127602 |
CC_ST_IP_R = pd.DataFrame(train_bene_df.groupby(['ChronicCond_stroke'])['IPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['ChronicCond_stroke'])['BeneID'].count())
CC_ST_IP_R.columns = ['AVG IP Reimbursement Amt']
CC_ST_IP_R
| AVG IP Reimbursement Amt | |
|---|---|
| ChronicCond_stroke | |
| 1 | 8111.475260 |
| 2 | 3278.239134 |
CC_ST_OP_R = pd.DataFrame(train_bene_df.groupby(['ChronicCond_stroke'])['OPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['ChronicCond_stroke'])['BeneID'].count())
CC_ST_OP_R.columns = ['AVG OP Reimbursement Amt']
CC_ST_OP_R
| AVG OP Reimbursement Amt | |
|---|---|
| ChronicCond_stroke | |
| 1 | 1925.197188 |
| 2 | 1244.396404 |
CC_ST_IP_D = pd.DataFrame(train_bene_df.groupby(['ChronicCond_stroke'])['IPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['ChronicCond_stroke'])['BeneID'].count())
CC_ST_IP_D.columns = ['AVG IP Co-payment Amt']
CC_ST_IP_D
| AVG IP Co-payment Amt | |
|---|---|
| ChronicCond_stroke | |
| 1 | 861.399489 |
| 2 | 360.225326 |
CC_ST_OP_D = pd.DataFrame(train_bene_df.groupby(['ChronicCond_stroke'])['OPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['ChronicCond_stroke'])['BeneID'].count())
CC_ST_OP_D.columns = ['AVG OP Co-payment Amt']
CC_ST_OP_D
| AVG OP Co-payment Amt | |
|---|---|
| ChronicCond_stroke | |
| 1 | 535.947508 |
| 2 | 364.135061 |
CC_ST_all_amts = pd.concat([CC_ST_IP_R, CC_ST_OP_R, CC_ST_IP_D, CC_ST_OP_D], axis=1)
CC_ST_all_amts
| AVG IP Reimbursement Amt | AVG OP Reimbursement Amt | AVG IP Co-payment Amt | AVG OP Co-payment Amt | |
|---|---|---|---|---|
| ChronicCond_stroke | ||||
| 1 | 8111.475260 | 1925.197188 | 861.399489 | 535.947508 |
| 2 | 3278.239134 | 1244.396404 | 360.225326 | 364.135061 |
# Here, I'm displaying the Total Annual Sum of IP Co-payment for 'ChronicCond_stroke'
with plt.style.context('seaborn-poster'):
fig = CC_ST_all_amts.plot(kind='bar', colormap='rainbow')
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{round(height,0)}', (x + width/2, y + height*1.015), ha='center', fontsize=12, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nHaving Chronic ST Disease?", fontdict=label_font_dict)
plt.xticks(ticks=[0,1], labels=['YES', 'NO'], fontsize=13, rotation=30)
plt.ylabel("Total Annual Sum \n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="-.", color='lightgrey')
plt.minorticks_on()
plt.title("Total Annual Sum of various amounts : 'ChronicCond_stroke'\n", fontdict=title_font_dict)
# 1 means +ve with Chronic ST Disease
# 2 means -ve with Chronic ST Disease
OBSERVATION# Number of beneficiaries with chronic or no-chronic conditions
pd.DataFrame(train_bene_df.groupby(['RenalDiseaseIndicator'])['BeneID'].count())
| BeneID | |
|---|---|
| RenalDiseaseIndicator | |
| 0 | 118978 |
| Y | 19578 |
RKD_IP_R = pd.DataFrame(train_bene_df.groupby(['RenalDiseaseIndicator'])['IPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['RenalDiseaseIndicator'])['BeneID'].count())
RKD_IP_R.columns = ['AVG IP Reimbursement Amt']
RKD_IP_R
| AVG IP Reimbursement Amt | |
|---|---|
| RenalDiseaseIndicator | |
| 0 | 3051.430264 |
| Y | 7360.808050 |
RKD_OP_R = pd.DataFrame(train_bene_df.groupby(['RenalDiseaseIndicator'])['OPAnnualReimbursementAmt'].sum() / train_bene_df.groupby(['RenalDiseaseIndicator'])['BeneID'].count())
RKD_OP_R.columns = ['AVG OP Reimbursement Amt']
RKD_OP_R
| AVG OP Reimbursement Amt | |
|---|---|
| RenalDiseaseIndicator | |
| 0 | 1047.351611 |
| Y | 2822.774543 |
RKD_IP_D = pd.DataFrame(train_bene_df.groupby(['RenalDiseaseIndicator'])['IPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['RenalDiseaseIndicator'])['BeneID'].count())
RKD_IP_D.columns = ['AVG IP Co-payment Amt']
RKD_IP_D
| AVG IP Co-payment Amt | |
|---|---|
| RenalDiseaseIndicator | |
| 0 | 342.384239 |
| Y | 749.057616 |
RKD_OP_D = pd.DataFrame(train_bene_df.groupby(['RenalDiseaseIndicator'])['OPAnnualDeductibleAmt'].sum() / train_bene_df.groupby(['RenalDiseaseIndicator'])['BeneID'].count())
RKD_OP_D.columns = ['AVG OP Co-payment Amt']
RKD_OP_D
| AVG OP Co-payment Amt | |
|---|---|
| RenalDiseaseIndicator | |
| 0 | 311.659139 |
| Y | 779.167944 |
RKD_all_amts = pd.concat([RKD_IP_R, RKD_OP_R, RKD_IP_D, RKD_OP_D], axis=1)
RKD_all_amts
| AVG IP Reimbursement Amt | AVG OP Reimbursement Amt | AVG IP Co-payment Amt | AVG OP Co-payment Amt | |
|---|---|---|---|---|
| RenalDiseaseIndicator | ||||
| 0 | 3051.430264 | 1047.351611 | 342.384239 | 311.659139 |
| Y | 7360.808050 | 2822.774543 | 749.057616 | 779.167944 |
# Here, I'm displaying the Total Annual Sum of IP Co-payment for 'RenalDiseaseIndicator'
with plt.style.context('seaborn-poster'):
fig = RKD_all_amts.plot(kind='bar', colormap='rainbow')
# Using the "patches" function we will get the location of the rectangle bars from the graph.
## Then by using those location(width & height) values we will add the annotations
for p in fig.patches:
width = p.get_width()
height = p.get_height()
x, y = p.get_xy()
fig.annotate(f'{round(height,0)}', (x + width/2, y + height*1.015), ha='center', fontsize=12, rotation=0)
# Providing the labels and title to the graph
plt.xlabel("\nHaving Chronic Renal Kidney Disease?", fontdict=label_font_dict)
plt.xticks(ticks=[0,1], labels=['NO', 'YES'], fontsize=13, rotation=30)
plt.ylabel("Total Annual Sum \n", fontdict=label_font_dict)
plt.grid(which='major', linestyle="-.", color='lightgrey')
plt.minorticks_on()
plt.title("Total Annual Sum of various amounts : 'RenalDiseaseIndicator'\n", fontdict=title_font_dict)
# Y means +ve with Renal Kidney Disease
# 0 means -ve with Renal Kidney Disease
OBSERVATIONdef cal_display_percentiles(x_col, y_col, title_lbl, x_filter_code):
"""
Description : This function is created for calculating and generating the percentiles for pre-disease indicators.
Input: It accepts below parameters:
1. x_col : Disease indicator feature name.
2. y_col : Feature like re-imbursement or deductible amount whose percentiles you want to generate.
3. title_lbl : Label to be provided in the title of the plot.
4. x_filter_code : Category code for which you want to generate the percentiles.
Output: It returns the dataframe having percentiles and their respective values for the specific disease indicator feature.
And, it displays the pointplot graph of the same.
"""
percentiles = []
percentiles_vals = []
# Calculating & storing the various percentiles and their respective values
for val in [0.1,0.2,0.25,0.3,0.4,0.5,0.6,0.7,0.75,0.8,0.9,0.91,0.92,0.93,0.94,0.95,0.96,0.97,0.98,0.99,0.999,0.9999,0.99999,0.999999,1.0]:
percentile = round(float(val*100),6)
percentiles.append(percentile)
percentile_val = round(train_bene_df[train_bene_df[x_col] == x_filter_code][y_col].quantile(val),1)
percentiles_vals.append(percentile_val)
# Creating the temp dataframe for displaying the results
tmp_percentiles = pd.DataFrame([percentiles, percentiles_vals]).T
tmp_percentiles.columns = ['Percentiles', 'Values']
# Here, I'm displaying the Percentiles values for all disease code features
with plt.style.context('seaborn-poster'):
plt.figure(figsize=(15,7))
sns.pointplot(data=tmp_percentiles, x='Percentiles', y='Values', markers="o", palette='spring')
sns.pointplot(data=tmp_percentiles, x='Percentiles', y='Values', markers="", color='grey', linestyles="solid")
# Providing the labels and title to the graph
plt.xlabel("\nPercentiles", fontdict=label_font_dict)
plt.xticks(rotation=90, size=12)
plt.ylabel("Total Annual `{}` Sum \n".format(y_col), fontdict=label_font_dict)
plt.grid(which='major', linestyle="-.", color='lightpink')
plt.minorticks_on()
plt.title("Percentile values of `{}` :: `{}`\n".format(y_col,title_lbl), fontdict=title_font_dict)
return tmp_percentiles
RKD_YES_IP_R_percentiles = cal_display_percentiles(x_col='RenalDiseaseIndicator',
y_col='IPAnnualReimbursementAmt',
title_lbl="Renal Kidney Disease = YES",
x_filter_code='Y')
OBSERVATIONRKD_NO_IP_R_percentiles = cal_display_percentiles(x_col='RenalDiseaseIndicator',
y_col='IPAnnualReimbursementAmt',
title_lbl="Renal Kidney Disease = NO",
x_filter_code='0')
OBSERVATIONRKD_YES_OP_R_percentiles = cal_display_percentiles(x_col='RenalDiseaseIndicator',
y_col='OPAnnualReimbursementAmt',
title_lbl="Renal Kidney Disease = YES",
x_filter_code='Y')
OBSERVATIONRKD_NO_OP_R_percentiles = cal_display_percentiles(x_col='RenalDiseaseIndicator',
y_col='OPAnnualReimbursementAmt',
title_lbl="Renal Kidney Disease = NO",
x_filter_code='0')
OBSERVATIONCC_ST_YES_IP_R_percentiles = cal_display_percentiles(x_col='ChronicCond_stroke',
y_col='IPAnnualReimbursementAmt',
title_lbl="ChronicCond_stroke = YES",
x_filter_code=1)
OBSERVATIONCC_ST_NO_IP_R_percentiles = cal_display_percentiles(x_col='ChronicCond_stroke',
y_col='IPAnnualReimbursementAmt',
title_lbl="ChronicCond_stroke = NO",
x_filter_code=2)
OBSERVATIONCC_ST_YES_OP_R_percentiles = cal_display_percentiles(x_col='ChronicCond_stroke',
y_col='OPAnnualReimbursementAmt',
title_lbl="ChronicCond_stroke = YES",
x_filter_code=1)
OBSERVATIONCC_ST_NO_OP_R_percentiles = cal_display_percentiles(x_col='ChronicCond_stroke',
y_col='OPAnnualReimbursementAmt',
title_lbl="ChronicCond_stroke = NO",
x_filter_code=2)
OBSERVATIONdef plot_strip_plots(x_col, hue_col, y_col, lgd_title):
"""
Description : This function is created for plotting the spread of data points of pre-disease indicators for the Annual IP and OP expenditures
across males and females.
Input: It accepts below parameters:
1. x_col : Gender feature.
2. hue_Col : Pre-Disease indicator
3. y_col : Feature like re-imbursement or deductible amount whose percentiles you want to generate.
4. lgd_title : Category code for which you want to generate the data spread.
Output: It displays the stipplot graph of the same.
"""
with plt.style.context('seaborn-poster'):
plt.figure(figsize=(10,7))
sns.stripplot(data=train_bene_df, x=x_col, y=y_col, hue=hue_col, palette='plasma')
# Providing the labels and title to the graph
plt.xlabel("\n{}".format(x_col), fontdict=label_font_dict)
plt.xticks(rotation=90, size=12)
plt.ylabel("{}\n".format(y_col), fontdict=label_font_dict)
plt.grid(which='major', linestyle="-.", color='lightpink')
plt.minorticks_on()
plt.title("Spread of payment paid by payer\n", fontdict=title_font_dict)
plt.legend(loc='upper center',title=lgd_title)
plot_strip_plots(x_col='Gender', hue_col="RenalDiseaseIndicator", y_col='IPAnnualReimbursementAmt', lgd_title="Renal Kidney Disease")
OBSERVATIONtrain_bene_df['OPAnnualReimbursementAmt'].min(), train_bene_df['IPAnnualReimbursementAmt'].min()
(-70, -8000)
train_bene_df['OPAnnualDeductibleAmt'].min(), train_bene_df['IPAnnualDeductibleAmt'].min()
(0, 0)
plot_strip_plots(x_col='Gender', hue_col="RenalDiseaseIndicator", y_col='OPAnnualReimbursementAmt', lgd_title="Renal Kidney Disease")
OBSERVATIONplot_strip_plots(x_col='Gender', hue_col="ChronicCond_rheumatoidarthritis", y_col='IPAnnualReimbursementAmt', lgd_title="Rheumatoidarthritis")
OBSERVATIONplot_strip_plots(x_col='Gender', hue_col="ChronicCond_rheumatoidarthritis", y_col='OPAnnualReimbursementAmt', lgd_title="Rheumatoidarthritis")
OBSERVATIONplot_strip_plots(x_col='Gender', hue_col="ChronicCond_IschemicHeart", y_col='IPAnnualReimbursementAmt', lgd_title="Ischemic Heart")
OBSERVATIONplot_strip_plots(x_col='Gender', hue_col="ChronicCond_IschemicHeart", y_col='OPAnnualReimbursementAmt', lgd_title="Ischemic Heart")
OBSERVATIONwith plt.style.context('seaborn-poster'):
plt.figure(figsize=(12,12))
sns.scatterplot(data=train_bene_df, x='AGE', y='IPAnnualReimbursementAmt', hue='Gender', palette='cubehelix')
plt.grid(which='major', linestyle="-.", color='lightpink')
plt.minorticks_on()
plt.title("Spread of payment paid by payer for entire age\n", fontdict=title_font_dict)
OBSERVATIONwith plt.style.context('seaborn-poster'):
plt.figure(figsize=(12,12))
sns.scatterplot(data=train_bene_df, x='AGE', y='OPAnnualReimbursementAmt', hue='Gender', palette='cubehelix')
plt.grid(which='major', linestyle="-.", color='lightpink')
plt.minorticks_on()
plt.title("Spread of payment paid by payer for entire age\n", fontdict=title_font_dict)
OBSERVATIONwith plt.style.context('seaborn-poster'):
plt.figure(figsize=(12,12))
sns.boxenplot(data=train_bene_df, x='AGE_groups', y='IPAnnualReimbursementAmt', hue='Gender', palette='cubehelix')
plt.minorticks_on()
plt.title("Spread of payment paid by payer across age groups\n", fontdict=title_font_dict)
OBSERVATIONwith plt.style.context('seaborn-poster'):
plt.figure(figsize=(12,12))
sns.boxenplot(data=train_bene_df, x='AGE_groups', y='OPAnnualReimbursementAmt', hue='Gender', palette='cubehelix')
plt.minorticks_on()
plt.title("Spread of payment paid by payer across age groups\n", fontdict=title_font_dict)
OBSERVATIONwith plt.style.context('seaborn-poster'):
plt.figure(figsize=(16,12))
sns.boxenplot(data=train_bene_df, x='Patient_Age_Month', y='IPAnnualReimbursementAmt', hue='Gender', palette='cubehelix')
plt.minorticks_on()
plt.title("Spread of payment paid by payer across DOB Months\n", fontdict=title_font_dict)
plt.legend(loc='upper right', title="Gender")
OBSERVATIONwith plt.style.context('seaborn-poster'):
plt.figure(figsize=(16,12))
sns.boxenplot(data=train_bene_df, x='Patient_Age_Month', y='OPAnnualReimbursementAmt', hue='Gender', palette='cubehelix')
plt.minorticks_on()
plt.title("Spread of payment paid by payer across DOB Months\n", fontdict=title_font_dict)
plt.legend(loc='upper right', title="Gender")
OBSERVATIONwith plt.style.context('seaborn'):
plt.figure(figsize=(16,12))
sns.stripplot(data=train_bene_df, x='Patient_Age_Year', y='IPAnnualReimbursementAmt', hue='Gender', palette='cubehelix')
plt.xticks(rotation=90, fontsize=11)
plt.grid(which='major', linestyle="-.", color='lightpink')
plt.minorticks_on()
plt.title("Spread of payment paid by payer across DOB Years\n", fontdict=title_font_dict)
OBSERVATIONwith plt.style.context('seaborn'):
plt.figure(figsize=(16,12))
sns.stripplot(data=train_bene_df, x='Patient_Age_Year', y='OPAnnualReimbursementAmt', hue='Gender', palette='cubehelix')
plt.xticks(rotation=90, fontsize=11)
plt.grid(which='major', linestyle="-.", color='lightpink')
plt.minorticks_on()
plt.title("Spread of payment paid by payer across DOB Years\n", fontdict=title_font_dict)
OBSERVATIONwith plt.style.context('seaborn-poster'):
plt.figure(figsize=(12,8))
sns.boxenplot(data=train_bene_df, x='Dead_or_Alive', y='IPAnnualReimbursementAmt', hue='Gender', palette='autumn')
plt.minorticks_on()
plt.title("Spread of payment paid by payer based on life status\n", fontdict=title_font_dict)
OBSERVATIONwith plt.style.context('seaborn-poster'):
plt.figure(figsize=(12,8))
sns.boxenplot(data=train_bene_df, x='Dead_or_Alive', y='OPAnnualReimbursementAmt', hue='Gender', palette='autumn')
plt.minorticks_on()
plt.title("Spread of payment paid by payer based on life status\n", fontdict=title_font_dict)
OBSERVATIONwith plt.style.context('seaborn-poster'):
plt.figure(figsize=(12,8))
sns.boxenplot(data=train_bene_df, x='Race', y='IPAnnualReimbursementAmt', hue='Gender', palette='twilight')
plt.minorticks_on()
plt.title("Spread of payment paid by payer based on life status\n", fontdict=title_font_dict)
OBSERVATIONwith plt.style.context('seaborn-poster'):
plt.figure(figsize=(12,8))
sns.boxenplot(data=train_bene_df, x='Race', y='OPAnnualReimbursementAmt', hue='Gender', palette='twilight')
plt.minorticks_on()
plt.title("Spread of payment paid by payer based on life status\n", fontdict=title_font_dict)
OBSERVATIONSUMMARY¶DOB YEARDOB MONTHAGE GROUPSLIFE STATUSHUMAN RACESTATENoOfMonths_PartACovNoOfMonths_PartBCovPre-disease indicators looks like important features based on the initial analysis thus it would interesting to see how much they are useful after adding CLAIMS dataset.Date of Death is also removed from the dataset, as we have already calculated bene age, life status and others out of it.train_bene_df.drop(["NoOfMonths_PartACov", "NoOfMonths_PartBCov"], axis=1, inplace=True)
train_bene_df.shape
(138556, 27)
train_bene_df.head()
| BeneID | DOB | Gender | Race | RenalDiseaseIndicator | State | County | ChronicCond_Alzheimer | ChronicCond_Heartfailure | ChronicCond_KidneyDisease | ChronicCond_Cancer | ChronicCond_ObstrPulmonary | ChronicCond_Depression | ChronicCond_Diabetes | ChronicCond_IschemicHeart | ChronicCond_Osteoporasis | ChronicCond_rheumatoidarthritis | ChronicCond_stroke | IPAnnualReimbursementAmt | IPAnnualDeductibleAmt | OPAnnualReimbursementAmt | OPAnnualDeductibleAmt | Patient_Age_Year | Patient_Age_Month | Dead_or_Alive | AGE | AGE_groups | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BENE11001 | 1943-01-01 | 1 | 1 | 0 | 39 | 230 | 1 | 2 | 1 | 2 | 2 | 1 | 1 | 1 | 2 | 1 | 1 | 36000 | 3204 | 60 | 70 | 1943 | 1 | 0 | 67.0 | Old |
| 1 | BENE11002 | 1936-09-01 | 0 | 1 | 0 | 39 | 280 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 0 | 0 | 30 | 50 | 1936 | 9 | 0 | 73.3 | Old |
| 2 | BENE11003 | 1936-08-01 | 1 | 1 | 0 | 52 | 590 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 2 | 0 | 0 | 90 | 40 | 1936 | 8 | 0 | 73.4 | Old |
| 3 | BENE11004 | 1922-07-01 | 1 | 1 | 0 | 39 | 270 | 1 | 1 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 2 | 0 | 0 | 1810 | 760 | 1922 | 7 | 0 | 87.5 | Very Old |
| 4 | BENE11005 | 1935-09-01 | 1 | 1 | 0 | 24 | 680 | 2 | 2 | 2 | 2 | 1 | 2 | 1 | 2 | 2 | 2 | 2 | 0 | 0 | 1790 | 1200 | 1935 | 9 | 0 | 74.3 | Old |
train_bene_df.to_csv("train_bene_1.csv")